42 research outputs found

Parallel and sequential approximation of shortest superstrings

Author: J-S. Turner
J. Gallant
J. Tarhio
M. R. Garey
V. Chvatal
Publication venue: 'Springer Science and Business Media LLC'

A framework for research on technology-enhanced special education

Author: Jormanainen Ilkka
Kärnä-Lin Eija
Lahti Lauri
Pihlainen-Bednarik Kaisa
Sutinen E.
Tarhio J.
Virnes M.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2007

Based on results from the Technologies for Children with Individual Needs Project and two case projects, we propose a new multidisciplinary framework for research between computer science, educational technology, and special education. The framework presents a way to conduct research that aims at developing new methods for technology-enhanced special education and for developing adaptable software and hardware tools for individual needs in educational settings. Peer reviewed

Crossref

Aaltodoc Publication Archive

Fast Searching in Packed Strings

Author: A. Amir
D.E. Knuth
E.W. Myers
G. Navarro
J. Tarhio
K. Fredriksson
R. Baeza-Yates
R.A. Baeza-Yates
R.M. Karp
R.S. Boyer
S. Wu
S.T. Klein
T.A. Welch
V.L. Arlazarov
W. Masek
W. Rytter
Publication date: 01/01/2009

Given strings $P$ and $Q$, the (exact) string matching problem is to find all positions of substrings in $Q$ matching $P$. The classical Knuth-Morris-Pratt algorithm [SIAM J. Comput., 1977] solves the string matching problem in linear time, which is optimal if we can only read one character at a time. However, most strings are stored in a computer in a packed representation with several characters in a single word, giving us the opportunity to read multiple characters simultaneously. In this paper we study the worst-case complexity of string matching on strings given in packed representation. Let $m \leq n$ be the lengths of $P$ and $Q$, respectively, and let $\sigma$ denote the size of the alphabet. On a standard unit-cost word-RAM with logarithmic word size we present an algorithm using time $O\left(\frac{n}{\log_\sigma n} + m + \mathrm{occ}\right)$. Here $\mathrm{occ}$ is the number of occurrences of $P$ in $Q$. For $m = o(n)$ this improves the $O(n)$ bound of the Knuth-Morris-Pratt algorithm. Furthermore, if $m = O(n/\log_\sigma n)$ our algorithm is optimal, since any algorithm must spend at least $\Omega\left(\frac{(n+m)\log \sigma}{\log n} + \mathrm{occ}\right) = \Omega\left(\frac{n}{\log_\sigma n} + \mathrm{occ}\right)$ time to read the input and report all occurrences. The result is obtained by a novel automaton construction based on the Knuth-Morris-Pratt algorithm combined with a new compact representation of subautomata allowing an optimal tabulation-based simulation.

Comment: To appear in Journal of Discrete Algorithms. Special Issue on CPM 200
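
For orientation, the Knuth-Morris-Pratt baseline that this abstract compares against can be sketched as below. This is the textbook one-character-at-a-time algorithm running in O(n + m) time, not the packed-string algorithm of the paper; the function name `kmp_search` is an illustrative assumption.

```python
def kmp_search(pattern, text):
    """Return all start positions of pattern in text (textbook KMP)."""
    m = len(pattern)
    if m == 0:
        return []

    # fail[i] = length of the longest proper prefix of pattern[:i+1]
    # that is also a suffix of it (the "failure function").
    fail = [0] * m
    k = 0
    for i in range(1, m):
        while k > 0 and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k

    # Scan the text once; k is the number of pattern characters matched.
    hits = []
    k = 0
    for j, c in enumerate(text):
        while k > 0 and c != pattern[k]:
            k = fail[k - 1]
        if c == pattern[k]:
            k += 1
        if k == m:
            hits.append(j - m + 1)   # occurrence ends at position j
            k = fail[k - 1]          # keep going to find overlapping matches
    return hits
```

Each text character is read individually here, which is exactly the cost the packed representation is meant to avoid.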

arXiv.org e-Print Archive

CiteSeerX

Elsevier - Publisher Connector

Crossref

Online Research Database In Technology

Approximate string matching with reduced alphabet

Author: B. Ďurian
E. Ukkonen
J. Kärkkäinen
J. Tarhio
K. Fredriksson
L. Salmela
M. Fontaine
M.R. Garey
P. Jokinen
R. Baeza-Yates
R. Muth
R. Zhu
R.M. Karp
R.N. Horspool
R.S. Boyer
T. Berry
T. Lecroq
V. Mäkinen
V.L. Arlazarov
W.J. Masek
Z. Liu
Publication venue: Heidelberg, Berlin, Springer Verlag
Publication date: 01/01/2010

Peer reviewed

Crossref

Helsingin yliopiston digitaalinen arkisto

Bit-parallel search algorithms for long patterns

Author: A. Hume
A.C.-C. Yao
G. Navarro
G. Zhang
H. Peltola
J. Tarhio
K. Fredriksson
L. He
M. Crochemore
M.O. Külekci
R.N. Horspool
T. Lecroq
Publication date: 01/01/2010

Peer reviewed

Crossref

Helsingin yliopiston digitaalinen arkisto

A Fast Algorithm for Approximate String Matching on Gene Sequences

Author: A. Cornish-Bowden
G. Navarro
J. Tarhio
L. Valinsky
N. El-Mabrouk
R.A. Baeza-Yates
R.N. Horspool
R.S. Boyer
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2005

Crossref

Greedy Shortest Common Superstring Approximation in Compact Space

Author: A Zaritsky
E Ohlebusch
E Ukkonen
H Kaplan
HN Gabow
J Gallant
J Qin
J Tarhio
JS Turner
JT Simpson
S Gog
TH Cormen
V Mäkinen
Publication venue: Springer International Publishing AG
Publication date: 06/09/2017

Given a set of strings, the shortest common superstring problem is to find the shortest possible string that contains all the input strings. The problem is NP-hard, but a lot of work has gone into designing approximation algorithms for solving the problem. We present the first time and space efficient implementation of the classic greedy heuristic which merges strings in decreasing order of overlap length. Our implementation works in O(n log σ) time and bits of space, where n is the total length of the input strings in characters, and σ is the size of the alphabet. After index construction, a practical implementation of our algorithm uses roughly 5n log σ bits of space and reasonable time for a real dataset that consists of DNA fragments. Peer reviewed
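
The greedy heuristic named in this abstract (repeatedly merge the two strings with the longest overlap) can be illustrated with a naive sketch. This is a simple quadratic-per-round version for small inputs only, and is not the compact-space implementation the paper describes; the names `overlap` and `greedy_superstring` are illustrative assumptions.

```python
def overlap(a, b):
    """Length of the longest suffix of a that is also a prefix of b."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_superstring(strings):
    """Naive greedy approximation of a shortest common superstring."""
    # Drop duplicates and strings already contained in another input.
    uniq = list(dict.fromkeys(strings))
    items = [s for s in uniq if not any(s != t and s in t for t in uniq)]

    while len(items) > 1:
        # Find the ordered pair (i, j) with the largest overlap.
        best = (-1, 0, 1)
        for i, a in enumerate(items):
            for j, b in enumerate(items):
                if i != j:
                    k = overlap(a, b)
                    if k > best[0]:
                        best = (k, i, j)
        k, i, j = best
        # Merge the pair, writing the shared overlap only once.
        merged = items[i] + items[j][k:]
        items = [s for idx, s in enumerate(items) if idx not in (i, j)]
        items.append(merged)

    return items[0] if items else ""
```

For example, the inputs `abc`, `bcd`, and `cde` merge into `abcde`. The result always contains every input string, but it is not guaranteed to be the shortest possible superstring.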

Crossref

Helsingin yliopiston digitaalinen arkisto

Comparing De Novo Genome Assembly: The Long and Short of It

Author: A Phillippy
B Mishra
B Schmidt
Bud Mishra
C Alkan
C Aston
D Bryant
D Hernandez
D Schwartz
D Sommer
DR Zerbino
EW Myers
F Sanger
FR Blattner
G Narzisi
GG Sutton
Giuseppe Narzisi
IT Paulsen
J Butler
J Tarhio
JC Dohm
JC Mullikin
JM Kidd
JR Miller
JT Simpson
M Antoniotti
M Eppinger
M Hossain
M Wu
MJ Chaisson
P Green
P Medvedev
PA Pevzner
PN Ariyaratne
R Li
RL Warren
RW Hung
S Batzoglou
S Boisvert
S Gnerre
S Kim
S Kurtz
SL Salzberg
SR Gill
SS Hall
Stein Aerts
T Anantharaman
T Baba
TS Anantharaman
WR Jeck
X Huang
Publication venue: Public Library of Science
Publication date: 29/04/2011

Recent advances in DNA sequencing technology and their focal role in Genome Wide Association Studies (GWAS) have rekindled a growing interest in the whole-genome sequence assembly (WGSA) problem, thereby inundating the field with a plethora of new formalizations, algorithms, heuristics and implementations. And yet, scant attention has been paid to comparative assessments of these assemblers' quality and accuracy. No commonly accepted and standardized method for comparison exists yet. Even worse, widely used metrics to compare the assembled sequences emphasize only size, poorly capturing the contig quality and accuracy. This paper addresses these concerns: it highlights common anomalies in assembly accuracy through a rigorous study of several assemblers, compared under both standard metrics (N50, coverage, contig sizes, etc.) as well as a more comprehensive metric (Feature-Response Curves, FRC) that is introduced here; FRC transparently captures the trade-offs between contigs' quality and their sizes. For this purpose, most of the publicly available major sequence assemblers – both for low-coverage long (Sanger) and high-coverage short (Illumina) reads technologies – are compared. These assemblers are applied to microbial (Escherichia coli, Brucella, Wolbachia, Staphylococcus, Helicobacter) and partial human genome sequences (Chr. Y), using sequence reads of various read-lengths, coverages, accuracies, and with and without mate-pairs. It is hoped that, based on these evaluations, computational biologists will identify innovative sequence assembly paradigms, bioinformaticists will determine promising approaches for developing “next-generation” assemblers, and biotechnologists will formulate more meaningful design desiderata for sequencing technology platforms. A new software tool for computing the FRC metric has been developed and is available through the AMOS open-source consortium.
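
As a point of reference for the size-based metrics this abstract criticizes, N50 is commonly defined as the largest contig length L such that contigs of length at least L together cover at least half of the total assembly length. A minimal sketch under that common definition follows; the function name `n50` is an illustrative assumption, and the FRC metric itself is not reproduced here.

```python
def n50(contig_lengths):
    """N50 of a list of contig lengths (0 for an empty assembly)."""
    total = sum(contig_lengths)
    running = 0
    # Walk contigs from longest to shortest until half the total is covered.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0
```

Because the value depends only on contig lengths, a misassembled but long contig raises N50, which is the kind of blind spot the abstract points to.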

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

A method for automatically extracting infectious disease-related primers and probes from the literature

Author: A Loy
Alejandro Cuevas
BS Rice
D Betel
DA Benson
David Pérez-Rey
Diana de la Iglesia
EA Mothershed
F Li
F Pattyn
Fernando Martín-Sánchez
G De la Calle
Guillermo de la Calle
Guillermo López-Campos
H González-Díaz
H Hyyrö
HD VanGuilder
HP Lee
J Stajich
J Tamames
J Tarhio
JJ Rocchio
José Crespo
K Pabbaraju
L Hirschman
LL Cheng
LT Bravo
M Minsky
MB Miller
MC Enright
MG Campi
Miguel García-Remesal
National Center for Biotechnology Information
P Harmon
PC Woo
R McDonald
RM Ratcliff
SF Altschul
Victoria López-Alonso
Víctor Maojo
YC Huang
Publication venue: 'Springer Science and Business Media LLC'

Crossref

Excel as an Algorithm Animation Environment

Author: E. Rautama
E. Sutinen
J. Tarhio

Understanding of fundamental algorithms and designing algorithms for a novel problem are basic skills in Computer Science. Animation is a useful aid in both these areas. We show how to animate algorithms with Microsoft Excel using data visualization and macro programming features of Excel. The user writes an algorithm using the Visual Basic programming language of Excel and defines charts visualizing dynamically the data structures of the algorithm. This approach is suitable especially for small-scale animation, e.g. for course assignments in Computer Science.

1 Introduction

Teaching algorithms is a problem in Computer Science education: most students regard them as abstract black boxes with an irrelevant content. For example, the idea behind a certain sorting method is not interesting for a student. However, usability and efficiency of an algorithm for a particular problem are important aspects to learn. To inspire students to get interested in algorithms is challenging. The idea of..

CiteSeerX